ABSTRACT

With the rapid economic development of China over the past decade, air pollution has become an increasingly serious problem in major cities and has caused grave public health concerns. Recently, a number of studies have dealt with air quality and air pollution. Some of them attempt to predict and monitor air quality from different sources of information, ranging from deployed physical sensors to social media. These methods are either too expensive or unreliable, prompting us to search for a novel and effective way to sense air quality. In this study, we propose to employ state-of-the-art computer vision techniques to analyze photos that can be easily acquired from online social media. We then establish the correlation between the haze level computed directly from the photos and the official PM2.5 record for the city and time at which each photo was taken. Our experiments on both synthetic and real photos show the promise of this image-based approach to estimating and monitoring air pollution.

Categories and Subject Descriptors 

I.4.8 [Scene Analysis]: Miscellaneous; I.5.4 [Applications]: Computer Vision

Keywords 

Air Quality, Haze Level, User Generated Photos, Image Analytics

1. INTRODUCTION 

Air pollution is one of the major environmental side products of modern industrialization. The first step in controlling air pollution is to monitor air quality and raise public awareness. Airborne particulate matter is a kind of air pollutant that carries hazardous chemicals, which penetrate deeply into the human lungs and bloodstream and cause many health problems [10]. PM2.5/Haze, the finest kind of airborne particulate matter, has recently attracted much attention among people living in large Chinese cities such as Beijing, because it has been the major air pollutant since the government began to publish PM2.5/Haze data in 2012. In this paper, we propose a system to estimate the haze level from a single photo.

Figure 1: Overview of the proposed framework. Given a photo, we first estimate the transmission matrix using the Dark Channel Prior (DCP) [4]. In parallel, we estimate the depth map using Deep Convolutional Neural Fields (DCNF) [7]. By combining the transmission matrix and the depth map, we estimate the haze level of the photo.

While accurate air quality sensor networks have been established across the world, there are multiple advantages to using photos to estimate the haze level. 1) Sensors are expensive, and their coverage is therefore limited. According to an official real-time air quality data platform, there are only 12 monitoring stations for the whole of Beijing, a very large city, and many cities and rural areas have no monitoring stations at all. Haze monitoring based on ubiquitous online photos can therefore serve as an information source complementary to official data. 2) Haze estimation from a photo enables mobile phone users to snap a photo and measure the local air quality. Such micro-level information, in contrast to macro-level metrics such as the Air Quality Index, is especially valuable for individuals.

Although judging haze seems simple for the naked eye, estimating the haze level automatically from a photo is challenging, partly because of the large visual variation of scenes, the different photography skill levels of mobile users, and even varying photo resolutions. Our solution to this problem is illustrated in Fig. 1. We first estimate a transmission matrix generated by a haze removal algorithm, and we estimate a depth map for all pixels in the photo. A haze level score is computed by combining the transmission matrix and the depth map, and it can be calibrated to estimate the PM2.5 level. We regard the transmission matrix as the perceived depth of a hazy photo, which combines the actual depth and the haze effects. Therefore, by ruling out the actual depth factor, we can isolate the haze effects in the transmission matrix and use them to estimate the haze level.

We make the following contributions in this paper:

• We propose an effective method to estimate the haze level from a photo.

• We augment an existing haze removal benchmark for haze level estimation research.

• We collect a large-scale dataset of more than 8,000 photos associated with PM2.5 data. Along with a synthetic image dataset, the real-world data help validate the effectiveness of our image-based approach.

2. RELATED WORK 

Sharing our motivation to provide an information source complementary to official data, several previous methods use auxiliary data to monitor air quality. For example, [1] proposed installing sensors on city street sweepers to monitor air quality in San Francisco. [3] proposed integrating social media and official records to monitor air quality and predict health hazards. [8] proposed identifying keywords on Weibo (a Chinese counterpart of Twitter) to track the city-level Air Quality Index. [9] also proposed estimating the visibility/haze level from photos. Our approach differs from [9] in that 1) [9] assumes manually segmented sky regions, and 2) [9] needs camera calibration and other sensors, such as accelerometers and magnetometers, to calibrate the luminance; our method has neither restriction. [6] also proposed estimating the haze level from a photo; their method is based on statistics computed directly from image pixels and is therefore the most related to ours. We compare with [6] in our experiments, which show that our method is superior in the presence of complex scenes and haze conditions.

3. PROPOSED METHOD 

Fig. 1 illustrates the framework of the proposed method. There are three major components: transmission matrix estimation, depth map estimation, and haze level estimation.

3.1 Haze Model 

Following Koschmieder's law of atmospheric scattering, the imaging process of a photo taken under hazy conditions is modeled by the following equation:

$$L(x) = L_0(x)\,t(x) + L_s(x)\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-k\,d(x)}, \tag{1}$$

in which x denotes the pixel coordinates, L(x) is the pixel value sensed by the camera, L_0(x) is the actual luminance of the scene, t(x) is called the transmission matrix, d(x) is the depth map of the scene, k controls the haze intensity, and L_s(x) denotes the lighting condition, e.g., the sky luminance. In this paper, we are interested in estimating the haze level, i.e., the value of k. As illustrated in Fig. 3, a larger k indicates heavier haze.
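As a concrete illustration of Eqn. (1), the following minimal Python/NumPy sketch (our own, not the authors' code) renders haze onto a haze-free image with a known depth map. It assumes a sky luminance L_s that is constant over the image, pixel values scaled to [0, 1], and depth in units consistent with k; the function name `synthesize_haze` is hypothetical.

```python
import numpy as np

def synthesize_haze(clear, depth, k, sky=1.0):
    """Apply the haze model of Eqn. (1) to a haze-free image.

    clear: H x W x 3 array holding the haze-free luminance L0(x), in [0, 1]
    depth: H x W array holding the scene depth d(x)
    k:     haze intensity; a larger k means heavier haze
    sky:   sky luminance Ls, assumed constant over the image (a simplification)
    Returns the hazy image L(x) and the transmission t(x).
    """
    t = np.exp(-k * depth)              # transmission t(x) = exp(-k d(x))
    t3 = t[..., np.newaxis]             # broadcast t(x) over the color channels
    hazy = clear * t3 + sky * (1.0 - t3)
    return hazy, t
```

With k = 0 the image is unchanged; as k grows, every pixel moves toward the sky luminance, and distant pixels (large d) move fastest.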

3.2 Estimate Transmission 

Based on the effective Dark Channel Prior, [4] proposed estimating the transmission matrix t(x) using the following equation:

$$t(x) = 1 - \omega \min_{c} \min_{y \in \Omega(x)} \frac{L^{c}(y)}{L_s^{c}}, \tag{2}$$

in which c denotes the color channel, e.g., R, G or B; ω controls the amount of haze preserved to make the final dehazed photo look natural and is empirically fixed at 0.95; Ω(x) denotes an image patch centered at x, with the patch size fixed at 15; L^c(y) is the value of channel c at pixel y; and L_s^c denotes the estimated sky luminance. We follow the same algorithm as [4] to estimate L_s^c and fix it for each channel and image. Note that Eqn. (2) can be easily implemented using elementwise operations and an image erosion.
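Eqn. (2) maps onto a few array operations. The sketch below (NumPy; all function names are hypothetical) computes the dark channel as a per-pixel minimum over color channels followed by a grayscale erosion over a 15 × 15 patch, estimates L_s^c with the heuristic described in [4] (among the brightest 0.1% of dark-channel pixels, take the one with the highest input intensity), and applies Eqn. (2) with ω = 0.95. It is a simplified illustration rather than the reference implementation of [4]; in particular, we measure "intensity" as the sum of the RGB values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def min_filter(img, size):
    """Grayscale erosion: minimum over a size x size window centred at each pixel."""
    if size % 2 == 0:
        raise ValueError("patch size must be odd")
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")               # replicate border pixels
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return windows.min(axis=(-2, -1))

def dark_channel(img, patch=15):
    """min over color channels c, then min over the patch Omega(x)."""
    return min_filter(img.min(axis=2), patch)

def estimate_sky(img, dark, top=0.001):
    """Per-channel sky luminance Ls^c from the brightest dark-channel pixels."""
    flat_dark = dark.ravel()
    n = max(1, int(flat_dark.size * top))
    idx = np.argpartition(flat_dark, -n)[-n:]        # n largest dark-channel values
    flat_img = img.reshape(-1, 3)
    best = idx[np.argmax(flat_img[idx].sum(axis=1))]  # brightest candidate in the input
    return flat_img[best]

def estimate_transmission(img, omega=0.95, patch=15):
    """Raw transmission t(x) of Eqn. (2) for an H x W x 3 image in [0, 1]."""
    dark = dark_channel(img, patch)
    sky = estimate_sky(img, dark)
    normalized = img / np.maximum(sky, 1e-6)         # L^c(y) / Ls^c
    return 1.0 - omega * dark_channel(normalized, patch), sky
```

The erosion is what makes the inner minimum over Ω(x) cheap: a single sliding-window minimum replaces an explicit loop over patches.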

After the rough estimation in Eqn. (2), soft matting or, more efficiently, guided filtering [5] is applied to refine the transmission matrix. Since the guided filter has become a standard operation, we simply denote the refinement process as follows:

$$\tilde{t}(x) = \mathrm{GuidedFilter}\bigl(t(x), L(x), W\bigr), \tag{3}$$

in which W, the window size, is a parameter of the guided filter and is empirically fixed at 60.
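The refinement in Eqn. (3) can be sketched with the standard box-filter formulation of the guided filter from [5]. The sketch below assumes a single-channel (grayscale) guide derived from L(x), treats the window parameter as the box radius r (whether W denotes a radius or a full width is an assumption here), and uses a regularization weight eps = 1e-3, which is our choice; the helper names are hypothetical.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window around each pixel (edge-padded)."""
    size = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    # integral image with a leading row and column of zeros
    s = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    total = s[size:, size:] - s[:-size, size:] - s[size:, :-size] + s[:-size, :-size]
    return total / (size * size)

def guided_filter(p, guide, r, eps=1e-3):
    """Smooth p (e.g., a raw transmission map) while following the edges of guide."""
    mean_I = box_mean(guide, r)
    mean_p = box_mean(p, r)
    cov_Ip = box_mean(guide * p, r) - mean_I * mean_p
    var_I = box_mean(guide * guide, r) - mean_I ** 2
    a = cov_Ip / (var_I + eps)          # local linear coefficients: q = a * I + b
    b = mean_p - a * mean_I
    return box_mean(a, r) * guide + box_mean(b, r)
```

For example, `guided_filter(t_raw, gray, r=60)` would smooth a raw transmission map while keeping its edges aligned with those of the image. The integral-image box filter keeps the cost independent of r, which matters for a window as large as 60.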

3.3 Depth Estimation 

Given the transmission matrix t(x), the value of k in Eqn. (1) can be computed directly if the depth map d(x) is known. Therefore, we propose to use a standalone image depth estimator to remove the effect of d(x) from t(x). We adopt the Deep Convolutional Neural Fields (DCNF) proposed in [7] for depth estimation. DCNF estimates depth from an image by inference in a learned CRF over superpixels, whose objective function combines unary and pairwise potentials as follows:

$$E(\mathbf{y}, \mathbf{x}; \theta, \beta) = \sum_{p \in \mathcal{N}} U(y_p, \mathbf{x}; \theta) + \sum_{(p,q) \in \mathcal{S}} V(y_p, y_q, \mathbf{x}; \beta), \tag{4}$$

where N is the set of superpixels, S is the set of neighboring superpixel pairs, U(·) is the unary potential parameterized by a multi-layer Convolutional Neural Network over the pixel values with network parameters θ, and V(·) is the pairwise potential parameterized by a single-layer neural network over a set of similarity measurements, e.g., color histogram and LBP similarity [7]. The model parameters (θ and β) are learned on a standard dataset, and we use the model trained on the Make3D dataset [11] for our outdoor case.
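As we understand [7], the potentials in Eqn. (4) are quadratic: U(y_p, x; θ) = (y_p − z_p)², where z_p is the CNN's depth regression for superpixel p, and V(y_p, y_q, x; β) = ½ R_pq (y_p − y_q)², where R_pq is a β-weighted combination of the similarity measurements. Under these assumptions, and summing V over ordered neighbor pairs, the energy is quadratic in y and its minimizer has the closed form y* = (I + D − R)⁻¹ z, with D the diagonal matrix of row sums of R. The NumPy sketch below illustrates only this inference step; z and R are hypothetical inputs standing in for the trained networks, and this is not the implementation of [7].

```python
import numpy as np

def crf_energy(y, z, R):
    """Energy of Eqn. (4) with the (assumed) quadratic potentials.

    y: candidate depth per superpixel, shape (n,)
    z: unary CNN predictions per superpixel, shape (n,)  -- hypothetical input
    R: symmetric (n, n) non-negative pairwise weights, zero for non-neighbours
    The pairwise term sums over ordered neighbour pairs, (p, q) and (q, p).
    """
    unary = np.sum((y - z) ** 2)
    pairwise = 0.5 * np.sum(R * (y[:, None] - y[None, :]) ** 2)
    return unary + pairwise

def crf_map(z, R):
    """Closed-form minimiser y* = (I + D - R)^{-1} z, D = diag(row sums of R)."""
    n = len(z)
    A = np.eye(n) + np.diag(R.sum(axis=1)) - R   # positive definite for R >= 0
    return np.linalg.solve(A, z)
```

Because I + D − R is positive definite for non-negative weights, the stationary point is the unique minimum; with R = 0 the smoothing vanishes and y* = z.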

3.4 Haze Estimate 

Given the transmission matrix t(x) and the depth map d(x), it appears straightforward to estimate the haze level k according to Eqn. (1). However, because of scaling issues, and because there is only a single haze level k for each image while t(x) and d(x) are computed for each pixel, the interactions among these quantities are complicated. We therefore propose to select from a large pool of combinations of transformation, combination and pooling functions, denoted as follows:

$$k = \mathcal{P}\Bigl\{\mathcal{C}\bigl[\mathcal{T}^{t}(t(x)),\, \mathcal{T}^{d}(d(x))\bigr]\Bigr\}, \tag{5}$$

where T^t(·) and T^d(·) are the transformation functions, e.g., log, applied to the transmission matrix and the depth map, respectively; C[·] is a bivariate function, e.g., division, that combines the two matrices; and P{·} is a pooling function, e.g., max, that aggregates the resulting matrix into a single value. We explain the choices of these functions in the Experiments section.
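Eqn. (1) suggests why a logarithm and a division are natural ingredients of Eqn. (5): if the model held exactly, −ln t(x) = k d(x), so every pixel would give k = −ln t(x) / d(x). The following NumPy sketch (function names hypothetical) shows this idealized inversion followed by median pooling. With estimated t(x) and d(x), the per-pixel values are noisy and arbitrarily scaled, which is why the method searches over transformations, combinations and poolings instead of relying on this exact formula.

```python
import numpy as np

def pixelwise_k(t, d, eps=1e-6):
    """Invert t(x) = exp(-k d(x)) pixel by pixel: k(x) = -ln t(x) / d(x)."""
    return -np.log(np.clip(t, eps, 1.0)) / np.maximum(d, eps)

def pooled_k(t, d, pool=np.median):
    """Collapse the per-pixel estimates into a single haze level for the image."""
    return float(pool(pixelwise_k(t, d)))
```

The clipping guards against log(0) and against transmission values above 1, which noisy estimates can produce.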




Figure 2: An example scene from the FRIDA dataset [12, 13]. The images are the original image, the four types of haze conditions, and the depth map, respectively.


4. EXPERIMENTS 

In this section, we present our experiments to validate the proposed method. We first present the synthetic and real image datasets. Then we describe the evaluation protocol and the baselines for comparison. Finally, we present the comparison results, which demonstrate the effectiveness of the proposed method.

4.1 Datasets 

We experiment on both synthetic and real images. 
FRIDA is a synthetic haze image dataset that serves as a benchmark for haze removal research. FRIDA1 contains 90 synthetic images of 18 urban road scenes [13]. FRIDA2 contains 330 synthetic images of 66 diverse road scenes [12]. Both FRIDA1 and FRIDA2 are generated artificially using the same algorithm: 1) a scene, together with its depth map, is generated using computer software; 2) given the depth map, 4 types of haze conditions are applied to the computer-generated image (CGI) according to the model in Eqn. (1). An example image, its depth map and the hazy images are shown in Fig. 2. The value of k in Eqn. (1) is fixed for the released images, which is suitable for haze detection and removal but not for haze level estimation. We therefore reproduce the synthesis algorithm using the provided original CGIs and depth maps, but with varying k values to simulate various haze levels. With 4 types of haze conditions and 9 haze levels, we generate 36 hazy images for each scene; together with the original images, there are 666 images in FRIDA1 and 2437 images in FRIDA2. The effect of a larger k is illustrated in Fig. 3.

PM25 is the real image dataset we crawled from a tourist website. The photos in this dataset were taken at various tourist attractions in Beijing, and their timestamps were recorded. We then associate these photos with the hourly PM2.5 records measured by the U.S. Embassy in Beijing. There are a total of 8,761 photos with associated PM2.5 records in this dataset. We use PM2.5 as a proxy for the haze level k to evaluate the proposed method. Because of measurement errors and other factors affecting the PM2.5 records, using PM2.5 as a proxy for the haze level is noisy. Therefore, we additionally select 46 photos that are manually categorized as NonHaze, LightHaze and HeavyHaze, and we use 0, 1 and 2, respectively, as the proxy for the haze level k of each category. There are 22, 14 and 10 photos in these categories, respectively. We refer to the full set as PM25 and to the subset as PM25-S.
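The association between photos and PM2.5 records is essentially a lookup by hour. The following is a minimal sketch under two assumptions not stated in the text: photo timestamps and PM2.5 records use the same time zone, and each hourly record is keyed by the start of its hour. The function names and data layout are hypothetical.

```python
from datetime import datetime

def hourly_key(ts: datetime) -> datetime:
    """Truncate a photo timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def attach_pm25(photos, pm25_by_hour):
    """Pair each photo with the PM2.5 reading for the hour in which it was taken.

    photos:       list of (photo_id, datetime) pairs
    pm25_by_hour: dict mapping an hour-aligned datetime to a PM2.5 value
    Photos whose hour has no record are skipped.
    """
    paired = []
    for photo_id, ts in photos:
        value = pm25_by_hour.get(hourly_key(ts))
        if value is not None:
            paired.append((photo_id, value))
    return paired
```

If the records instead refer to the hour ending at their timestamp, `hourly_key` would need to round up rather than truncate.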

4.2 Evaluation Protocols 

We compare the proposed method with several baselines and with the method proposed in [6], in order to show that combining the transmission matrix and the depth map achieves superior performance. The baselines trans and depth use only a single factor, i.e., the transmission matrix and the depth map, respectively. depth⊗trans is the proposed method, which combines both factors. jcsb2014 is the statistical method proposed in [6], the only previous work we found in the literature that deals with haze level estimation.

Figure 3: Varying k in Eqn. (1) for the image in Fig. 2. The haze level increases with increasing k.

Different choices of the transmission matrix (raw, Eqn. (2), or refined, Eqn. (3)) and of the functions in Eqn. (5) make up the pool of all possible variations of the proposed method and the baselines. The choices of functions in Eqn. (5) are based on our observations of Eqn. (1) and are listed as follows:

• Transformation function T(x): log(x + 1), log(log(x + 1) + 1), and the identity transformation T(x) = x.

• Bivariate function C[t, d]: t × d, t/d and d/t. For the baselines trans and depth, C[t, d] is simply t and d, respectively.

• Pooling function P{M}: mean, median, max, 75th percentile and 90th percentile.
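The pool described by these options can be enumerated mechanically. The sketch below scores one image under every depth⊗trans variant, i.e., raw or refined transmission × transformation of t × transformation of d × combination × pooling. It reads "t × d" as an elementwise product, guards divisions with a small eps, and leaves out the single-factor baselines; these choices and all names are ours, not the authors'.

```python
import itertools
import numpy as np

TRANSFORMS = {
    "identity": lambda x: x,
    "log": lambda x: np.log(x + 1.0),
    "loglog": lambda x: np.log(np.log(x + 1.0) + 1.0),
}
COMBINERS = {
    "t*d": lambda t, d: t * d,    # elementwise product (our reading of "t x d")
    "t/d": lambda t, d: t / d,
    "d/t": lambda t, d: d / t,
}
POOLS = {
    "mean": np.mean,
    "median": np.median,
    "max": np.max,
    "p75": lambda m: np.percentile(m, 75),
    "p90": lambda m: np.percentile(m, 90),
}

def haze_scores(t_raw, t_refined, depth, eps=1e-6):
    """Score one image under every depth-and-transmission variant of Eqn. (5)."""
    scores = {}
    for (t_name, t), tt, td, c, p in itertools.product(
        {"raw": t_raw, "refined": t_refined}.items(),
        TRANSFORMS, TRANSFORMS, COMBINERS, POOLS,
    ):
        a = np.maximum(TRANSFORMS[tt](t), eps)       # transformed transmission
        b = np.maximum(TRANSFORMS[td](depth), eps)   # transformed depth
        scores[(t_name, tt, td, c, p)] = float(POOLS[p](COMBINERS[c](a, b)))
    return scores
```

Running this over every image yields one score per image for each variant, which is the input to the variant selection by rank correlation.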

There are 991 variations of the methods, and we report the best result for each estimation model. Because the (proxy) haze level is ordinal, we use the Spearman rank correlation coefficient as the evaluation metric to compare different methods. In addition, the sign of the correlation is irrelevant to the comparison, since it only reflects whether a variant's score increases or decreases with haze (for example, by Eqn. (1) the transmission decreases as k grows); we therefore use the absolute value of the correlation as the final performance metric.
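Selecting the best variant by absolute Spearman correlation can be sketched as follows. Spearman's ρ is the Pearson correlation of the ranks, with tied values (such as the three proxy levels 0, 1 and 2 of PM25-S) receiving their average rank. We implement it with NumPy only to keep the example self-contained; `scipy.stats.spearmanr` computes the same coefficient together with a p-value. The function names are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                   # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of 1-based ranks
        i = j + 1
    return ranks

def abs_spearman(scores, haze_levels):
    """|Spearman rho|, i.e., |Pearson correlation of the average ranks|."""
    rx, ry = average_ranks(scores), average_ranks(haze_levels)
    return abs(np.corrcoef(rx, ry)[0, 1])

def best_variant(scores_by_variant, haze_levels):
    """Return the (name, scores) pair whose scores track the haze level best."""
    return max(scores_by_variant.items(),
               key=lambda kv: abs_spearman(kv[1], haze_levels))
```

Taking the absolute value lets a variant whose score falls as haze rises compete on equal terms with one whose score rises.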

Moreover, because each method produces a single scalar feature and involves no parameter fitting, we do not need to follow the standard practice of cross-validating the methods.

4.3 Results 

The evaluation results are shown in Table 1.

Table 1: Absolute Spearman correlation coefficients (%). Columns are the datasets and rows are the methods; depth⊗trans is the proposed method. All values shown have a p-value smaller than 0.001. The entry for jcsb2014 on PM25-S is N/A because its p-value is 0.3781.

    Method          FRIDA1   FRIDA2   PM25    PM25-S
    jcsb2014 [6]    77.34    77.44     3.95   N/A
    depth           76.74    53.47    25.32   70.14
    trans           85.56    87.38    28.10   84.32
    depth⊗trans     90.60    87.43    40.83   89.05

From Table 1, we make the following observations:

• All methods perform very well on the synthetic image datasets, which means that all methods, including the proposed method, the baselines and the method proposed in [6], are able to capture the haze level to some extent.

• The proposed method and the trans baseline outperform jcsb2014 on all datasets, and all of our variants outperform it on the real photos. The gain becomes more significant when the scenes and haze conditions are more complicated. See the example photos from different datasets in Fig. 5.

• The proposed method, which combines depth and transmission, performs better than the baselines that use single factors, especially on the difficult PM25 dataset. This indicates that it is important to consider transmission and depth together; neither factor alone correlates well with the haze level.

• The proposed method achieves very high correlation on the manually labeled real images (PM25-S), but its correlation on the full PM25 dataset is still modest, which indicates that using only the PM2.5 value as a proxy for the haze level is noisy. In other words, with a rank correlation of 0.41 (ρ² ≈ 0.17), the estimate accounts for only a limited part of the variation in the PM2.5 records.

To further validate the correlations and show the scale of the datasets, the predicted haze level and the ground truth are plotted in Fig. 4 for the best depth⊗trans variant on each dataset. With a simple calibration, we can map the estimate to three haze levels: Clear, Light and Heavy. In Fig. 5, we show examples of the prediction results on all datasets. While the results illustrated in Fig. 5 are very promising, we observe the following error patterns: 1) the uniform sky luminance assumption is violated; 2) a single large object occupies the photo and causes the depth estimator to fail; 3) the ground-truth label is wrong.
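The calibration is not specified in detail. One simple possibility, sketched below purely as a hypothetical example, fits two thresholds on labeled photos (e.g., a subset labeled like PM25-S) halfway between the median scores of consecutive classes, then bins new scores into Clear, Light or Heavy. It assumes the scores increase with haze; for a variant whose score decreases with haze, negate the scores first.

```python
import numpy as np

LEVELS = ("Clear", "Light", "Heavy")

def fit_thresholds(scores, labels):
    """Hypothetical calibration from labelled photos (0 = Clear, 1 = Light, 2 = Heavy).

    Places each threshold halfway between the median scores of consecutive classes.
    Assumes larger scores mean heavier haze and that the class medians increase.
    """
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    medians = [np.median(scores[labels == c]) for c in range(3)]
    return [(medians[0] + medians[1]) / 2.0, (medians[1] + medians[2]) / 2.0]

def to_level(score, thresholds):
    """Map a haze score to Clear / Light / Heavy using increasing thresholds."""
    return LEVELS[int(np.digitize(score, thresholds))]
```

Medians are used instead of means so that a few mislabeled photos, one of the error patterns noted above, shift the thresholds less.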

5. CONCLUSIONS 

We have proposed an effective method to estimate the haze level from single images. The input image is first fed into a haze removal algorithm to generate the transmission matrix, and the depth map is also estimated from the pixels. By removing the effect of depth, we estimate the haze level from the transmission matrix. Using a GPU backend for the Deep Convolutional Neural Fields, the whole processing time for one image is less than one second. The superior performance of combining the transmission matrix and the depth map is validated by the Spearman correlation between the estimated haze level and the ground truth on both synthetic and real image datasets. The results on the real image dataset show that further research is needed to make large-scale monitoring based on online user photos more reliable, e.g., by defining a better proxy for the ground-truth haze level. To encourage future research, we will release the datasets online.
Figure 4: The ground truth and predicted haze level for depth⊗trans on each dataset. The points are data instances, and the line is a fitted trend line illustrating the correlation.

6. REFERENCES 

[1] P. M. Aoki, R. J. Honicky, A. Mainwaring, C. Myers, E. Paulos, S. Subramanian, and A. Woodruff. A vehicle for research: Using street sweepers to explore the landscape of environmental community action. In SIGCHI '09. ACM, 2009.

[2] D. J. Best and D. E. Roberts. Algorithm AS 89: The upper tail probabilities of Spearman's rho. Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(3):377-379, 1975.

[3] J. Chen, H. Chen, G. Zheng, J. Z. Pan, H. Wu, and N. Zhang. Big smog meets web science: Smog disaster analysis based on social media and device data on the web. In WWW Companion '14, 2014.

[4] K. He, J. Sun, and X. Tang. Single image haze removal using dark channel prior. PAMI, 33(12), Dec. 2011.

[5] K. He, J. Sun, and X. Tang. Guided image filtering. PAMI, 35(6), June 2013.

[6] J. Mao, U. Phommasak, S. Watanabe, and H. Shioya. Detecting foggy images and estimating the haze degree factor. Journal of Computer Science & Systems Biology, 7(6):226-228, 2014.

[7] F. Liu, C. Shen, G. Lin, and I. Reid. Learning depth from single monocular images using deep convolutional neural fields. Technical report, University of Adelaide, 2015.

[8] S. Mei, H. Li, J. Fan, X. Zhu, and C. Dyer. Inferring air pollution by sniffing social media. In Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on, Aug. 2014.

[9] S. Poduri, A. Nimkar, and G. S. Sukhatme. Visibility monitoring using mobile phones. Annual Report: Center for Embedded Networked Sensing, pages 125-127, 2010.

[10] O. Raaschou-Nielsen, Z. J. Andersen, R. Beelen, E. Samoli, M. Stafoggia, G. Weinmayr, B. Hoffmann, P. Fischer, M. J. Nieuwenhuijsen, B. Brunekreef, et al. Air pollution and lung cancer incidence in 17 European cohorts: Prospective analyses from the European Study of Cohorts for Air Pollution Effects (ESCAPE). The Lancet Oncology, 14(9):813-822, 2013.

[11] A. Saxena, M. Sun, and A. Ng. Make3D: Learning 3D scene structure from a single still image. PAMI, 31(5):824-840, May 2009.

[12] J.-P. Tarel, N. Hautière, L. Caraffa, A. Cord, H. Halmaoui, and D. Gruyer. Vision enhancement in homogeneous and heterogeneous fog. IEEE Intelligent Transportation Systems Magazine, 4(2):6-20, Summer 2012.

[13] J.-P. Tarel, N. Hautière, A. Cord, D. Gruyer, and H. Halmaoui. Improved visibility of road scene images under heterogeneous fog. In IEEE Intelligent Vehicles Symposium (IV), pages 478-485, June 2010.


















Figure 5: Examples of photos from the datasets at different haze levels, along with the prediction results. The rows are from datasets PM25, PM25, PM25, PM25-S, PM25-S, PM25-S, FRIDA1 and FRIDA2, respectively. Prediction errors are highlighted with thick green borders and are analyzed at the end of the Experiments section.

